- 
        Couldn't load subscription status. 
- Fork 25.6k
Skip search shards with INDEX_REFRESH_BLOCK #129132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skip search shards with INDEX_REFRESH_BLOCK #129132
Conversation
| Pinging @elastic/es-search-foundations (Team:Search Foundations) | 
998cdd5    to
    1ecc447      
    Compare
  
    e233cc7    to
    17706e2      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments. The search part should be reviewed by the ES Search team.
        
          
                server/src/main/java/org/elasticsearch/cluster/metadata/IndexMetadata.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
      | var addIndexBlockRequest = new AddIndexBlockRequest(IndexMetadata.APIBlock.REFRESH, "test"); | ||
| client().execute(TransportAddIndexBlockAction.TYPE, addIndexBlockRequest).actionGet(); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The refresh block should be added automatically to newly created indices as long as they have replicas and the "use refresh block" setting is enabled in the node setting. We should remove the ability to add the refresh block through the Add Index Block API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a look @tlrx!
I was hoping to test this change outside of the context of Serverless. But I agree it's not appropriate to add the refresh block to that API for testing purposes only, so I will see if I can construct the scenario in some other way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I was able to get the setup I was looking for by adding the block directly to cluster state in the tests.
| assertHitCount(prepareSearch().setQuery(QueryBuilders.matchAllQuery()), 0); | ||
| } | ||
|  | ||
| public void testSearchMultipleIndicesEachWithAnIndexRefreshBlock() { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be folded into a single test, where one or more indices are randomly created, most of some with replicas but other without replicas, and then allocate zero or more search shards and check the expected results, finally assigning all search shards and check the results again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've folded this into a single test with some additional randomization. My goal is to keep the integration tests in the Serverless PR, so I'll add the test scenario you're proposing there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a first pass on the search related side of things and left a few questions and comments.
        
          
                server/src/main/java/org/elasticsearch/action/search/TransportSearchAction.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                server/src/main/java/org/elasticsearch/action/search/TransportSearchAction.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                server/src/main/java/org/elasticsearch/action/search/TransportSearchAction.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                server/src/main/java/org/elasticsearch/action/search/CanMatchPreFilterSearchPhase.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
      …luster state in tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but I know nothing about search shard iterators :)
| ClusterService clusterService = internalCluster().getInstance(ClusterService.class, dataNode.getName()); | ||
| ClusterState currentState = clusterService.state(); | ||
| ClusterState newState = ClusterState.builder(currentState).blocks(blocksBuilder).build(); | ||
| setState(clusterService, newState); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is not intended to be used in integration test as it overrides the current data node cluster state.
For testing the INDEX_REFRESH_BLOCK I think it makes sense to only have unit tests in stateful elasticsearch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @tlrx, can you explain more the risks of doing this?
I like having fine-grained control over blocks so I can write tests that block some indices and allow others in one search - I think this is critical to test. If I can only 'set' the block by controlling active search nodes (like I do in the other PR), I can't think of a way to achieve what I want.
        
          
                server/src/main/java/org/elasticsearch/action/search/SearchShardIterator.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                server/src/main/java/org/elasticsearch/action/search/CanMatchPreFilterSearchPhase.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                server/src/main/java/org/elasticsearch/action/search/SearchShardIterator.java
          
            Show resolved
            Hide resolved
        
              
          
                server/src/main/java/org/elasticsearch/action/search/TransportSearchAction.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                server/src/main/java/org/elasticsearch/action/search/TransportSearchAction.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
      | import static org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertHitCount; | ||
| import static org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertResponse; | ||
|  | ||
| public class SearchWithIndexBlocksIT extends ESIntegTestCase { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we also want to have ESQL test for this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm tracking that to be a followup task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a small question, I like how small the change has become!
        
          
                server/src/main/java/org/elasticsearch/action/search/TransportSearchAction.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
      | ❌ Author of the following commits did not sign a Contributor Agreement: Please, read and sign the above mentioned agreement if you want to contribute to this project | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a couple more questions. LGTM though, no need to wait further.
|  | ||
| static String[] ignoreBlockedIndices(ProjectState projectState, String[] concreteIndices) { | ||
| // optimization: mostly we do not have any blocks so there's no point in the expensive per-index checking | ||
| boolean hasIndexBlocks = projectState.blocks().indices(projectState.projectId()).isEmpty() == false; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: did we have a chance to incorporate this logic in buildPerIndexOriginalIndices perhaps, where we already look at blocks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to filter concreteIndices as it's used in multiple places. However, it may be possible to move the block checks in buildPerIndexOriginalIndices to ignoreBlockedIndices. I will have to verify, then if so, I'll follow up with the change.
| Set<ResolvedExpression> indicesAndAliases, | ||
| String[] concreteIndices | ||
| ) { | ||
| concreteIndices = ignoreBlockedIndices(projectState, concreteIndices); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand correctly that search shards will inherit the new behaviour as it calls getLocalShardsIterator? Other API that we need to update to look at this block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I've added a test to verify search shards respects the block.
| } | ||
| } | ||
|  | ||
| public void testIgnoreBlockedIndices() { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe specify which block?
| assertHitCount(prepareSearch().setQuery(QueryBuilders.matchAllQuery()), expectedHits); | ||
| } | ||
|  | ||
| public void testOpenPITOnIndicesWithIndexRefreshBlocks() { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was your conclusion on open PIT? We do the same filtering, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we filter just the same - that's verified in this test by asserting the hit count does not include docs from blocked indices.
…king * upstream/main: (100 commits) Term vector API on stateless search nodes (elastic#129902) TEST Fix ThreadPoolMergeSchedulerStressTestIT testMergingFallsBehindAndThenCatchesUp (elastic#131636) Add inference.put_custom rest-api-spec (elastic#131660) ESQL: Fewer serverless docs in tests (elastic#131651) Skip search on indices with INDEX_REFRESH_BLOCK (elastic#129132) Mute org.elasticsearch.indices.cluster.RemoteSearchForceConnectTimeoutIT testTimeoutSetting elastic#131656 [jdk] Resolve EA OpenJDK builds to our JDK archive (elastic#131237) Add optimized path for intermediate values aggregator (elastic#131390) Correctly handling download_database_on_pipeline_creation within a pipeline processor within a default or final pipeline (elastic#131236) Refresh potential lost connections at query start for `_search` (elastic#130463) Add template_id to patterned-text type (elastic#131401) Integrate LIKE/RLIKE LIST with ReplaceStringCasingWithInsensitiveRegexMatch rule (elastic#131531) [ES|QL] Add doc for the COMPLETION command (elastic#131010) ESQL: Add times to topn status (elastic#131555) ESQL: Add asynchronous pre-optimization step for logical plan (elastic#131440) ES|QL: Improve generative tests for FORK [130015] (elastic#131206) Update index mapping update privileges (elastic#130894) ESQL: Added Sample operator NamedWritable to plugin (elastic#131541) update `kibana_system` to grant it access to `.chat-*` system index (elastic#131419) Clarify heap size configuration (elastic#131607) ...
…-tracking * upstream/main: (44 commits) Term vector API on stateless search nodes (elastic#129902) TEST Fix ThreadPoolMergeSchedulerStressTestIT testMergingFallsBehindAndThenCatchesUp (elastic#131636) Add inference.put_custom rest-api-spec (elastic#131660) ESQL: Fewer serverless docs in tests (elastic#131651) Skip search on indices with INDEX_REFRESH_BLOCK (elastic#129132) Mute org.elasticsearch.indices.cluster.RemoteSearchForceConnectTimeoutIT testTimeoutSetting elastic#131656 [jdk] Resolve EA OpenJDK builds to our JDK archive (elastic#131237) Add optimized path for intermediate values aggregator (elastic#131390) Correctly handling download_database_on_pipeline_creation within a pipeline processor within a default or final pipeline (elastic#131236) Refresh potential lost connections at query start for `_search` (elastic#130463) Add template_id to patterned-text type (elastic#131401) Integrate LIKE/RLIKE LIST with ReplaceStringCasingWithInsensitiveRegexMatch rule (elastic#131531) [ES|QL] Add doc for the COMPLETION command (elastic#131010) ESQL: Add times to topn status (elastic#131555) ESQL: Add asynchronous pre-optimization step for logical plan (elastic#131440) ES|QL: Improve generative tests for FORK [130015] (elastic#131206) Update index mapping update privileges (elastic#130894) ESQL: Added Sample operator NamedWritable to plugin (elastic#131541) update `kibana_system` to grant it access to `.chat-*` system index (elastic#131419) Clarify heap size configuration (elastic#131607) ...
#117543 introduced a ClusterBlock which is applied to new indices in Serverless which do not yet have search shards up. We should skip searches for indices with this block in order to avoid meaningless 503s.
Edit: I've changed approaches to this "skipping" logic and wanted to document some reasoning:
I considered two options:
Option 1 would result in a search response with
{"_shards": { "skipped":= the number of shards we skipped due to an index refresh block. Option 2 pretends those shards don't exist, so{"_shards": { "skipped": 0, "total":< total in option 1.I've decided to go with option 2. The reason is that this cluster block is only active when the index was just created and is still in a red state. I could not think of a reason that a user would need to be aware of the number of skipped shards in this case (a user curious about this detail would likely be able to deduce what's going on anyway). Furthermore, it simplifies the code change (see 86c9a5d).